Skip to content

Make is_valid_syntax parts lazy loaded#14

Open
zibertscrem wants to merge 1 commit into
willynilly:mainfrom
venomlab:main
Open

Make is_valid_syntax parts lazy loaded#14
zibertscrem wants to merge 1 commit into
willynilly:mainfrom
venomlab:main

Conversation

@zibertscrem

Copy link
Copy Markdown

This PR is about optimizing performance of the main module import (syntax_helpers one). I believe there are other PRs with the same goal but my fork has been already tested in real environment

Why do I care

So, I develop a big data platform with orchestration based on airflow, and some of the airflow components start painfully slow. Specifically, api server and scheduler. And because of painfully slow startup sometimes on smaller environments I experience constant service restartings after re-deploys cause it cannot make initial start on time (which pisses me off infinitely)

When I performed profiling of the startup of those components - I've noticed that a lot of time was spent inside initial loading of jsonschema package which I believe airflow uses for enforcing their SDK API schema format

Then digging deeper I found that what's actually making things stuck is this package. Airflow spins up N parallel processes and they all go and try to import jsonschema which leads to importing this package N times on pretty small machine and it leads to freezes that cause k8s to kill them and reinitiate

What did I do

I just put everything that made sense under cached functions and added module-level getter for original variables and it shrinked loading time by a lot. I already tested it in my platform and it seems working just fine with jsonschema package. I don't know if any other library uses it differently but I tested what I cared for, the optimized version has been deployed to the dev environment of my platform and worked just fine. I also fixed a typo in RFC3987_SYNTAX_TERMS var (it lacked comma between terms)

I originally thought to put all the names into __all__ but then asterisk import from __init__ actually forces to load everything and it basically nukes all my lazy-loading efforts. Maybe it makes sense to put all the original names somewhere cause RFC3987_SYNTAX_TERMS variable actually lacks some of the function names that were presented in a module

Conclusion

If you want to me to change anything in the lazy load approach - pls let me know, I try to fit it in. Anyways, I have a good opportunity to test if I break anything in real env. I hope you find this PR good enough so we can have an optimized module loading. I believe a lot of people are pissed off because of such very deep dependency's loading time.

Thanks for bringing this library to the world anyways. I made these changes without AI and tested them manually and deployed on real environment. Yeah, it's dev environment but still is pretty real one

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce import-time overhead for rfc3987_syntax.syntax_helpers by lazily initializing the Lark grammar/parser and validator helpers, targeting faster startup for consumers that import the module indirectly (e.g., via jsonschema).

Changes:

  • Lazily load grammar and the main Lark parser via @lru_cache-backed getters.
  • Replace eagerly-created is_valid_syntax_* module globals with module-level __getattr__ that creates validators on demand.
  • Fix a missing comma in RFC3987_SYNTAX_TERMS.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

term = name.removeprefix("is_valid_syntax_")
return make_syntax_validator(term)
msg = f"Attribute {name} not found"
raise AttributeError(msg, name=name)

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise AttributeError(msg, name=name) is not a valid way to construct AttributeError (it doesn't accept keyword arguments). This will raise a TypeError when an unknown attribute is accessed. Raise AttributeError(msg) (or AttributeError(msg, name) positionally if you want .name set) instead.

Suggested change
raise AttributeError(msg, name=name)
raise AttributeError(msg)

Copilot uses AI. Check for mistakes.
return get_syntax_parser()
if name.startswith("is_valid_syntax_"):
term = name.removeprefix("is_valid_syntax_")
return make_syntax_validator(term)

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__getattr__ currently treats any is_valid_syntax_* attribute as valid and passes the suffix straight into make_syntax_validator(). Combined with @lru_cache(maxsize=None) this allows unbounded creation/caching of parsers for arbitrary attribute names (and can also surface confusing Lark configuration errors for typos). Consider checking term against RFC3987_SYNTAX_TERMS and raising AttributeError for unknown terms.

Suggested change
return make_syntax_validator(term)
if term in RFC3987_SYNTAX_TERMS:
return make_syntax_validator(term)

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +3
__all__ = [
"is_valid_syntax",
]

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are behavior changes here around lazy module attributes (grammar, syntax_parser, is_valid_syntax_*) and around what gets exported at package import time. Existing tests only exercise is_valid_syntax via import rfc3987_syntax as h, so regressions like missing re-exports (parse, RFC3987_SYNTAX_TERMS, is_valid_syntax_iri, etc.) or __getattr__ behavior won't be caught. Add tests that assert these public attributes are available and callable after importing rfc3987_syntax.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +3
__all__ = [
"is_valid_syntax",
]

Copilot AI Apr 30, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining __all__ as only ['is_valid_syntax'] changes what from .syntax_helpers import * exports. Since src/rfc3987_syntax/__init__.py uses a star import, this will stop re-exporting parse, RFC3987_SYNTAX_TERMS, and the documented is_valid_syntax_* helpers (breaking the public API/README examples). Consider removing the star import from __init__.py and explicitly re-exporting symbols (possibly via package-level __getattr__), or otherwise ensuring the previous public surface remains available without eagerly constructing all validators on import.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants